-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Vector128/256.Equals via TestZ #55875
Conversation
Partially fixes #55343 since that also talks about optimizing AllBitsSet to a TestC. |
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128_1.cs
Outdated
Show resolved
Hide resolved
@tannergooding @dotnet/jit-contrib does it look good? |
Co-authored-by: Tanner Gooding <tagoo@outlook.com>
… into opt-vector-create
@dotnet/jit-contrib PTAL, a small improvement for Vector128/256.Create Failed CI job is unrelated (broken gcc build) |
GenTree* op2 = hw->gtGetOp2(); | ||
if (!gtIsActiveCSE_Candidate(tree)) | ||
{ | ||
if (op1->IsIntegralConstVector(0) && !gtIsActiveCSE_Candidate(op1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that IsIntegralConstVector
will only work for integer vectors, so all floating point vectors will be rejected here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Xor is not used in Equals
for floats and doubles so it's not a big deal
but it currently handles this (Xor for floats) just fine:
static Vector128<float> Foo(Vector128<float> v) =>
Sse.Xor(v, Vector128.Create(0).AsSingle());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for zero float it might be tricky as e.g. -0.0
can't be used in this optimization so is not worth the effort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it be tricky? You can account for -0.0
by checking the bitwise (rather than floating-point) value is 0
.
Speaking of 0.0
vs -0.0
, it looks like there might be existing bugs in IsFPZero
and IsSIMDZero
since they don't take this into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. But would like @tannergooding to also approve.
@tannergooding PTAL |
} | ||
#endif | ||
|
||
// TODO: Enable substitution for CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE (typeof(T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an existing issue for this TODO
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I want to merge it - to start experimenting with it 🙂 the issue is #40381
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I just wanted to make sure we had some github issue corresponding to the TODO, so its not just a comment we'll forget about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. Would be good to ensure floating-point is equally well handled.
This PR optimizes Vector128/256.Equals via the following improvements:
Xor(x, zero)
to justx
in morphCodegen diff: https://www.diffchecker.com/zJhfk9yq
According to benchmarks, it makes it 10-15% faster
jit-diff is quite small:
Superpmi: